Trust Region-Guided Proximal Policy Optimization
Reviews: Trust Region-Guided Proximal Policy Optimization
The paper proposes to adapt the clipping procedure of Proximal Policy Optimization (PPO) so that the lower and upper bounds are no longer constant across all states. The authors show that constant bounds cause convergence to suboptimal policies when the policy is initialized poorly (e.g., when the probability of choosing optimal actions is small). As an alternative, the authors propose to compute state-action-specific lower and upper bounds that lie inside the trust region with respect to the previous policy. If the previous policy assigns a small probability to a given action, the bounds need not be as tight, allowing for less aggressive clipping. The adapted version of PPO, which the authors call TRGPPO, has provably better performance bounds than PPO and is validated empirically in several experiments.
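The fixed-bound clipping the review refers to can be sketched as follows. This is a minimal illustration of the standard PPO clipped surrogate, not code from the paper; eps = 0.2 is the common default rather than a value taken from the work under review:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (per sample).

    The clipping range [1 - eps, 1 + eps] is the same constant for every
    state-action pair. This is the choice the review argues can trap a
    badly initialized policy: a low-probability action can only increase
    its probability by the same fixed factor as any other action.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO takes the pessimistic (elementwise) minimum of the two terms.
    return np.minimum(unclipped, clipped)
```

With eps = 0.2, a probability ratio of 2.0 with positive advantage is clipped to contribute only 1.2 times the advantage, regardless of how small the action's probability was under the previous policy.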
Trust Region-Guided Proximal Policy Optimization
Wang, Yuhui, He, Hao, Tan, Xiaoyang, Gan, Yaozhong
Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily on the effectiveness of its exploratory policy search. In this paper, we give an in-depth analysis of the exploration behavior of PPO and show that PPO is prone to insufficient exploration, especially under bad initialization, which may lead to the failure of training or entrapment in bad local optima. To address these issues, we propose a novel policy optimization method, named Trust Region-Guided PPO (TRGPPO), which adaptively adjusts the clipping range within the trust region. We formally show that this method not only improves the exploration ability within the trust region but also enjoys a better performance bound compared to the original PPO. Extensive experiments verify the advantage of the proposed method.
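The abstract's key mechanism, a clipping range derived adaptively from a trust region, can be illustrated for a discrete action space. The sketch below is an assumption-laden reconstruction, not the paper's exact formulation: it searches for the largest probability ratio on one action that keeps the KL divergence from the old policy within a budget `delta`, with the remaining probability mass renormalized proportionally (the paper may distribute mass differently). Under this scheme, actions that were rare under the old policy receive a wider upper clipping bound, matching the exploration argument in the abstract:

```python
import numpy as np

def kl_after_ratio_change(p_old, a, ratio):
    """KL(pi_old || pi_new) when action a's probability is scaled by
    `ratio` and the other actions are renormalized proportionally.
    Illustrative assumption about how the new policy is formed."""
    p_new = p_old * (1.0 - p_old[a] * ratio) / (1.0 - p_old[a])
    p_new[a] = p_old[a] * ratio
    return float(np.sum(p_old * np.log(p_old / p_new)))

def adaptive_upper_bound(p_old, a, delta, tol=1e-8):
    """Binary-search the largest ratio u >= 1 whose induced policy stays
    within KL budget `delta` of the old policy; this plays the role of a
    state-action-specific upper clipping bound."""
    lo = 1.0
    hi = (1.0 / p_old[a]) * (1.0 - 1e-9)  # keep all probabilities positive
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_after_ratio_change(p_old, a, mid) <= delta:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, with old action probabilities [0.05, 0.45, 0.45, 0.05] and delta = 0.01, the rare first action gets a noticeably larger upper bound than the common second action, i.e. a less aggressive clip exactly where more exploration is needed.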